Labeling training data for SVM using Python PLEASE HELP

by: akhauri.yash, 8 years ago


Issue: I want to provide training data to an SVM to build a model that identifies good buy and sell positions (stock market trading). I have High, Low, Open and Close prices, along with Volume and Date. I also have a list of technical/fundamental indicators for each Date, which I calculated using my own code: EMA (12, 26, 40), RSI, MACD, ADX, EPS.

Is it appropriate to call the technical/fundamental indicators features?

I need ideas on how to automatically label good buy and sell points in the training data, so that the SVM can associate the indicators with good buy/sell points. I am new to machine learning with Python and am still teaching myself. If my idea is incorrect or unclear, kindly suggest a better implementation approach.

My idea: Write code that goes through the closing prices of each stock one by one and, whenever it finds a position with a percentage gain greater than R%, labels the buy and sell positions accordingly.

Problem: This can produce a HUGE number of possible buy and sell positions, since many pairs of buy and sell points will have gains anywhere from R% to (R+a)%. I need to make short-term trades, with a holding period of maybe 5-6 days. Without a limit on the holding period, the code might decide to just buy at the lowest point and sell at the highest over the course of my whole data set (2002-present).

Kindly suggest other ways to code for labeling my training set. Thank you.






You can certainly use indicators as features. It's not certain they'll be of actual use, but you can try.

As for labeling buy and sell points, this is actually very challenging. When I do it, I tend to take the current price, look at the price in some time frame in the future, let's say the next day. If the price is higher the next day, then I label that current price and current list of features as a "buy." If it drops, then it's a sell.

Alternatively, you can require the price to rise by more than, say, 1% within 5 days for it to be a buy, and to drop by more than 1% within 5 days for it to be a sell, and so on. There are certainly lots of options, but this is still a major research area.
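To make the two labeling rules above concrete, here is a minimal sketch in plain Python. The function name `label_fixed_horizon`, the "hold" label for moves inside the threshold, and the sample prices are all assumptions made for illustration, not something from a library or from real data. With `horizon=1` and `threshold=0.0` it reproduces the next-day rule; with `horizon=5` and `threshold=0.01` it is the 1%-in-5-days rule.

```python
def label_fixed_horizon(closes, horizon=5, threshold=0.01):
    """Label each bar by its forward return over `horizon` bars.

    Returns a list the same length as `closes` containing "buy", "sell",
    "hold", or None where there is not enough future data to decide.
    """
    labels = []
    for i, price in enumerate(closes):
        j = i + horizon
        if j >= len(closes):
            labels.append(None)  # no future price to compare against
            continue
        forward_return = closes[j] / price - 1.0
        if forward_return > threshold:
            labels.append("buy")
        elif forward_return < -threshold:
            labels.append("sell")
        else:
            labels.append("hold")  # move too small to count either way
    return labels


# Made-up closing prices, purely for illustration.
closes = [100.0, 101.0, 103.0, 102.0, 99.0, 98.0, 100.0]
next_day = label_fixed_horizon(closes, horizon=1, threshold=0.0)
two_day = label_fixed_horizon(closes, horizon=2, threshold=0.01)
print(next_day)
print(two_day)
```

The rows labeled None at the end of the series have no known outcome yet, so they would need to be dropped before training. The label for a given day only looks at future prices, while the features for that day should only use data available up to that day.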

In the end, having a "huge" number of labeled samples is actually desirable.

Your REAL challenge is also balancing your data. Often you will have many more buys than sells, or more holds than anything else, and the algorithm ends up fitting that ratio instead of learning anything from the features.
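One common way to deal with imbalance is to weight each class inversely to how often it occurs. The sketch below computes such weights in plain Python; the helper name `balanced_class_weights` and the sample label counts are assumptions for illustration. The formula, n_samples / (n_classes * class_count), is the same heuristic scikit-learn applies when you pass `class_weight="balanced"` to `sklearn.svm.SVC`, and a dict of this shape can also be passed as `class_weight` directly.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up label distribution: mostly holds, very few sells.
labels = ["hold"] * 6 + ["buy"] * 3 + ["sell"] * 1
weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items()):
    print(cls, round(weight, 3))
```

Rare classes get larger weights, so misclassifying a "sell" costs the model more than misclassifying a "hold". Resampling the training set so the classes are roughly even is another option.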



-Harrison 8 years ago



Thank you so much for your help. I have decided to start at one point and, if the price returns to that same level in the future, find the highest price in that time period and label a sell at that highest point if the gain is > x (I'll set x).
I'll also put in a stop-loss, so that if the price keeps going down, I'll just pull out of the stock and count the loss. I think that will be simple enough.
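A simplified version of that idea might look like the sketch below. It is only one interpretation, under stated assumptions: instead of waiting for the price to return to the entry level, it walks forward at most `max_hold` bars and checks whether a profit target or a stop-loss is hit first. The function name, the "avoid" label, the parameter values, and the prices are all hypothetical.

```python
def label_with_target_and_stop(closes, target=0.03, stop=0.02, max_hold=6):
    """Label each entry bar by which barrier the price touches first.

    "buy"   -> price gained at least `target` before losing `stop`
    "avoid" -> the stop-loss was hit first
    "hold"  -> neither barrier was hit within `max_hold` bars
    None    -> fewer than `max_hold` future bars are available
    """
    labels = []
    for i, entry in enumerate(closes):
        window = closes[i + 1 : i + 1 + max_hold]
        if len(window) < max_hold:
            labels.append(None)  # outcome not fully known yet
            continue
        label = "hold"
        for price in window:
            change = price / entry - 1.0
            if change <= -stop:
                label = "avoid"  # stop-loss triggered first
                break
            if change >= target:
                label = "buy"  # profit target reached first
                break
        labels.append(label)
    return labels


# Made-up closing prices, purely for illustration.
closes = [100.0, 101.0, 104.0, 99.0, 97.0, 98.0, 103.0, 105.0]
labels = label_with_target_and_stop(closes, target=0.03, stop=0.02, max_hold=3)
print(labels)
```

Capping the holding period at `max_hold` keeps the trades short-term and prevents the labeler from simply buying at the lowest point and selling at the highest point of the whole data set.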

-akhauri.yash 8 years ago
